Data Assimilation and Visualization Short Course

Visualization practical

Wednesday 17th September 2014

Jon Blower

Lesson 3: Plotting data from NetCDF files

The netCDF4 library provides access to data stored in NetCDF format, which is a very common modern format for storing scientific data. NetCDF files hold multidimensional arrays of data, plus attributes that describe the data. Different communities use different sets of attributes, but the most common set of attributes are defined by the Climate and Forecast conventions, otherwise known as the "CF Conventions".

In this practical we won't have time to look into the details of NetCDF or the CF conventions, but we'll just look at some simple code to extract and plot data.

(Many of the techniques in this lesson can apply to other file formats that store multidimensional arrays too, such as HDF.)

Objectives

  1. Learn a little about the structure of NetCDF files
  2. Learn how to extract multidimensional data from NetCDF files
  3. Learn how to create map plots and other kinds of plots

Creating map plots

First, we'll import the NetCDF library:


In [1]:
import netCDF4

We've put some data files in the data directory. Let's open one of them. This is some ocean potential temperature data from NCEP's Global Ocean Data Assimilation System.


In [2]:
nc = netCDF4.Dataset('../data/pottmp.2014.1time.nc')

We can see what variables we have in the dataset with nc.variables. This returns a dictionary of Variable objects, the keys of which give the identifiers of the variables:


In [3]:
nc.variables.keys()


Out[3]:
[u'date', u'lat', u'level', u'lon', u'pottmp', u'time', u'timePlot']

(The leading 'u' means that these are Unicode strings. Don't worry about this, it just means that the strings could contain international characters like 

ü or é if we wanted.)

We want to plot the potential temperature variable. Let's get a handle to this variable:


In [4]:
pottmp = nc.variables['pottmp']

We can find its units ("units" is an attribute of the vorticity variable):


In [5]:
pottmp.units


Out[5]:
u'K'

(Which means "Kelvin" of course.) We can find its long name, which is a human-readable string that describes what the variable represents:


In [6]:
pottmp.long_name


Out[6]:
u'Potential temperature'

Some datasets provide standard names, which are more precise description of the variable. This (unfortunately) isn't provided in this dataset, but if it were, you could have found it using sst.standard_name.

You can look up the definition of the standard name in the CF conventions' table of standard names if you want. (Note that in this particular case the standard name may not be quite correct.)

(Note that not all NetCDF files provide long names or standard names for their variables, but it's good practice to do so.)

Now let's extract some data from the variable. When we read data from a NetCDF variable, we get a Numpy array that we can manipulate in exactly the same way as any other Numpy array.

It's usually a good idea to check how big the data array will be, since they can be very large and can contain any number of dimensions. The shape attribute let's us find this, without actually reading any data at this point:


In [7]:
pottmp.shape


Out[7]:
(1, 40, 418, 360)

So the temperature variable is a four-dimensional array, but what are the dimensions? We can find this out too:


In [8]:
pottmp.dimensions


Out[8]:
(u'time', u'level', u'lat', u'lon')

So there is 1 value of time, 40 vertical levels, 418 latitude values and 360 of longitude. We're just going to plot a simple map, so we're going to slice the data at the first time value and level:


In [9]:
data = pottmp[0,0]
data.shape


Out[9]:
(418, 360)

Now data is a 2D array of all the data at the first timestep and vertical level. We can plot the data using matplotlib's contourf function, which creates a filled contour plot. In this case we're using 20 levels of colour.


In [10]:
import matplotlib.pyplot as plt

# These two lines are needed to make matplotlib plot figures inline and at a decent size.
# They are not needed in scripts.
%matplotlib inline
plt.rcParams['figure.figsize'] = (12.0, 8.0)

# These lines do the actual plotting
plt.contourf(data, 20)
plt.show()


This plot isn't very satisfactory. Apart from the fact that there are no labels or colour bar on the plot to indicate what we are looking at, the axis values are wrong. We can correct this by finding the variables that hold these values.

We know from above that there is a dimension called 'lat' and a dimension called 'lon'. The variables that hold the values of latitude and longitude have the same names, so we can get handles to the variable objects like this:


In [11]:
lon = nc.variables['lon']
lat = nc.variables['lat']

Now we can read the actual values of longitude and latitude. Note that "[:]" means "read all the values from the variable".


In [12]:
lonvals = lon[:]
latvals = lat[:]

Now we can make a better plot. Note how we use the long_name and units of the variables for the title and axis labels. Also, we are using a more appropriate colour scale than the default rainbow scale (question: why is this scale more appropriate?).


In [13]:
plt.title (pottmp.long_name + ' (' + pottmp.units + ')')
plt.xlabel(lon.long_name    + ' (' + lon.units    + ')')
plt.ylabel(lat.long_name    + ' (' + lat.units    + ')')
plt.contourf(lonvals, latvals, data, 20, cmap=plt.get_cmap('YlGnBu_r'))
plt.colorbar()
plt.show()


(By the way, you can resize the plot by dragging its bottom right-hand corner. Try it!)

Note that the axis labels and title are automatically read from the data file, which is neat.

Plot a vertical profile

We can create visualizations of many different views through the data. Recall that pottmp is a four-dimensional variable with dimensions (time, level, lat, lon). Let's say we want to take a vertical profile at a certain point in the data (i.e. we want to find temperature as a function of depth). How can we do this?

First, of course, we have to extract the data from the variable. We're going to pick a point roughly in the middle of the above plot. Recall that we have 360 values of longitude and 418 of latitude. So let's pick the longitude at index 180 and the latitude at index 214 (i.e. roughly in the middle). We're going to pick the first time value (there is only one anyway). We can extract all the values of temperature with the following command:


In [14]:
profile = pottmp[0,:,214,180]

You should read this as "give me temperature values for all values of depth at the 214th latitude index and the 180th longitude index". (Recall that Python array indices start at zero, not one.)

If we're going to plot a graph, we also need the values of depth:


In [15]:
depthvar = nc.variables['level']
depthvals = depthvar[:]

Now we can create a plot of temperature versus depth:


In [16]:
plt.plot(depthvals, profile)
plt.xlabel(depthvar.long_name + ' (' + depthvar.units + ')')
plt.ylabel(pottmp.long_name   + ' (' + pottmp.units   + ')')
plt.title('Vertical profile: first attempt')
plt.show()


Well, this is OK, but to be more intuitive, we would probably like the depth axis to be the vertical axis, so we should plot depth versus temperature, not the other way around. However, this would result in depth increasing vertically, which is not what we want. So we have to insert some code to reverse the depth axis. Here's how:


In [17]:
plt.plot(profile, depthvals)
plt.xlabel(pottmp.long_name   + ' (' + pottmp.units   + ')')
plt.ylabel(depthvar.long_name + ' (' + depthvar.units + ')')

plt.gca().invert_yaxis() # Gets the axes of the plot and reverses the y axis

plt.title('Vertical profile: this is better')
plt.show()


This is what we expect to see in a typical deep-ocean location: a shallow, warm, well-mixed layer at the surface followed by a steep gradient (the thermocline) to cold temperatures at depth.

Exercises

A script to generate the map plot above can be found in temperature.py. You can use this as the basis for your experiments in these exercises.

  1. Adjust the code to create a latitude-longitude plot at a different vertical level within the dataset.
  2. How can you improve the title of the plot so that it reads "Potential temperature at x m depth (K)" where x is the real depth in metres of the slice you are showing?
  3. Create a transect plot, which is a plot of potential temperature versus distance along a horizontal line. You could pick a line running north-south through the Pacific Ocean, for instance.
  4. Similarly, create a vertical section plot, which is a 2D plot of depth versus horizontal distance, using colour to indicate temperature values. You can use the same horizontal line as in the previous exercise.
  5. The data folder contains output from Phil Browne's barotropic vorticity model in a file called barotropic.nc. Create some plots (e.g. maps and transects) using this data file. The file has no vertical dimension so you can't create vertical profiles or sections.
    • The above data are from a model that is not really referenced to longitude and latitude (in fact the domain is toroidal, meaning that the boundary conditions are periodic on all edges of the domain). So the values of latitude and longitude don't actually mean anything physical in this dataset.

Further reading

NetCDF tutorial from Unidata: http://www.unidata.ucar.edu/software/netcdf/docs/netcdf-tutorial.html

The NetCDF operators (http://nco.sourceforge.net/) are an extremely useful set of command-line tools for manipulating data in NetCDF files. (Some of the data used in this practical were prepared using these tools. See the file ReadMe.txt in the data directory.)


In [17]: